Search | Portal Regional da BVS

1.

Interpreting Cis-Regulatory Interactions from Large-Scale Deep Neural Networks for Genomics.

Toneyan, Shushan; Koo, Peter K.

bioRxiv ; 2024 Mar 20.

Article in English | MEDLINE | ID: mdl-37461616

ABSTRACT

The rise of large-scale, sequence-based deep neural networks (DNNs) for predicting gene expression has introduced challenges in their evaluation and interpretation. Current evaluations align DNN predictions with experimental perturbation assays, which provides insights into the generalization capabilities within the studied loci but offers a limited perspective of what drives their predictions. Moreover, existing model explainability tools focus mainly on motif analysis, which becomes complex when interpreting longer sequences. Here we introduce CREME, an in silico perturbation toolkit that interrogates large-scale DNNs to uncover the rules of gene regulation that they learn. Using CREME, we investigate Enformer, a prominent DNN in gene expression prediction, revealing cis-regulatory elements (CREs) that directly enhance or silence target genes. We explore the complexity of higher-order CRE interactions, the effect of CRE distance from transcription start sites on gene expression, and the biochemical features of enhancers and silencers learned by Enformer. Moreover, we demonstrate the flexibility of CREME to efficiently uncover a higher-resolution view of functional sequence elements within CREs. This work demonstrates how CREME can be employed to translate the powerful predictions of large-scale DNNs to study open questions in gene regulation.
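The idea of an in silico perturbation described in this abstract can be illustrated with a minimal sketch. It is not CREME's implementation or API: the perturbation used here (shuffling the bases inside a tile), the function names `shuffle_tile` and `tile_effects`, and the motif-counting `toy_predict` that stands in for a DNN such as Enformer are all assumptions made for illustration.

```python
import random


def shuffle_tile(seq, start, end, rng):
    """Return seq with the bases in [start, end) randomly shuffled."""
    tile = list(seq[start:end])
    rng.shuffle(tile)
    return seq[:start] + "".join(tile) + seq[end:]


def tile_effects(seq, predict, tile_size, n_shuffles=10, seed=0):
    """Score each non-overlapping tile by the mean change in prediction
    after shuffling it. A negative effect suggests an enhancer-like tile,
    a positive effect a silencer-like tile (under this toy definition)."""
    rng = random.Random(seed)
    wild_type = predict(seq)
    effects = []
    for start in range(0, len(seq) - tile_size + 1, tile_size):
        end = start + tile_size
        mutant_mean = sum(
            predict(shuffle_tile(seq, start, end, rng)) for _ in range(n_shuffles)
        ) / n_shuffles
        effects.append((start, end, mutant_mean - wild_type))
    return wild_type, effects


def toy_predict(seq):
    """Hypothetical stand-in for a trained DNN: counts a made-up motif."""
    return float(seq.count("GATA"))


if __name__ == "__main__":
    sequence = "AAAAAAAA" + "GATAGATA" + "CCCCCCCC"
    baseline, scores = tile_effects(sequence, toy_predict, tile_size=8)
    print("baseline prediction:", baseline)
    for start, end, effect in scores:
        print(f"tile {start}-{end}: effect {effect:+.2f}")
```

In practice `predict` would wrap a real model's forward pass, and averaging over several shuffles reduces the chance that one random shuffle accidentally recreates a functional motif.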

2.

Current approaches to genomic deep learning struggle to fully capture human genetic variation.

Tang, Ziqi; Toneyan, Shushan; Koo, Peter K.

Nat Genet ; 55(12): 2021-2022, 2023 Dec.

Article in English | MEDLINE | ID: mdl-38036789

3.

EvoAug: improving generalization and interpretability of genomic deep neural networks with evolution-inspired data augmentations.

Lee, Nicholas Keone; Tang, Ziqi; Toneyan, Shushan; Koo, Peter K.

Genome Biol ; 24(1): 105, 2023 May 5.

Article in English | MEDLINE | ID: mdl-37143118

ABSTRACT

Deep neural networks (DNNs) hold promise for functional genomics prediction, but their generalization capability may be limited by the amount of available data. To address this, we propose EvoAug, a suite of evolution-inspired augmentations that enhance the training of genomic DNNs by increasing genetic variation. Random transformation of DNA sequences can potentially alter their function in unknown ways, so we employ a fine-tuning procedure using the original non-transformed data to preserve functional integrity. Our results demonstrate that EvoAug substantially improves the generalization and interpretability of established DNNs across prominent regulatory genomics prediction tasks, offering a robust solution for genomic DNNs.

Subjects

Genomics; Neural Networks, Computer; Genomics/methods
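The two-stage procedure described in the EvoAug abstract (train on randomly transformed sequences, then fine-tune on the original data) might be sketched as follows. This is not the EvoAug library's API. The abstract does not list the transformations, so the two used here (point mutation and reverse complement), their rates, and the names `augment` and `training_batches` are illustrative assumptions.

```python
import random

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def random_mutation(seq, rate, rng):
    """Replace each base with a different base with probability `rate`."""
    out = []
    for base in seq:
        if rng.random() < rate:
            out.append(rng.choice([b for b in "ACGT" if b != base]))
        else:
            out.append(base)
    return "".join(out)


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return "".join(COMPLEMENT[b] for b in reversed(seq))


def augment(seq, rng, mutation_rate=0.05, rc_prob=0.5):
    """Apply the illustrative transformations (assumed, not EvoAug's set)."""
    seq = random_mutation(seq, mutation_rate, rng)
    if rng.random() < rc_prob:
        seq = reverse_complement(seq)
    return seq


def training_batches(sequences, augment_epochs, finetune_epochs, seed=0):
    """Yield (stage, batch) pairs: augmented epochs first, then
    fine-tuning epochs on the original, non-transformed sequences."""
    rng = random.Random(seed)
    for _ in range(augment_epochs):
        yield "augment", [augment(s, rng) for s in sequences]
    for _ in range(finetune_epochs):
        yield "finetune", list(sequences)


if __name__ == "__main__":
    data = ["ACGTACGTAC", "GGCCTTAAGG"]
    for stage, batch in training_batches(data, augment_epochs=2, finetune_epochs=1):
        print(stage, batch)
```

A real training loop would pass each batch, with its labels, to the model's optimizer step. The fine-tuning stage exists because, as the abstract notes, random transformations can alter sequence function in unknown ways.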

4.

ETV6 dependency in Ewing sarcoma by antagonism of EWS-FLI1-mediated enhancer activation.

Gao, Yuan; He, Xue-Yan; Wu, Xiaoli S; Huang, Yu-Han; Toneyan, Shushan; Ha, Taehoon; Ipsaro, Jonathan J; Koo, Peter K; Joshua-Tor, Leemor; Bailey, Kelly M; Egeblad, Mikala; Vakoc, Christopher R.

Nat Cell Biol ; 25(2): 298-308, 2023 Feb.

Article in English | MEDLINE | ID: mdl-36658219

ABSTRACT

The EWS-FLI1 fusion oncoprotein deregulates transcription to initiate the paediatric cancer Ewing sarcoma. Here we used a domain-focused CRISPR screen to implicate the transcriptional repressor ETV6 as a unique dependency in this tumour. Using biochemical assays and epigenomics, we show that ETV6 competes with EWS-FLI1 for binding to select DNA elements enriched for short GGAA repeat sequences. Upon inactivating ETV6, EWS-FLI1 overtakes and hyper-activates these cis-elements to promote mesenchymal differentiation, with SOX11 being a key downstream target. We show that squelching of ETV6 with a dominant-interfering peptide phenocopies these effects and suppresses Ewing sarcoma growth in vivo. These findings reveal targeting of ETV6 as a strategy for neutralizing the EWS-FLI1 oncoprotein by reprogramming of genomic occupancy.

Subjects

Sarcoma, Ewing; Child; Humans; Sarcoma, Ewing/genetics; Sarcoma, Ewing/metabolism; Sarcoma, Ewing/pathology; Cell Line, Tumor; Gene Expression Regulation, Neoplastic; RNA-Binding Protein EWS/genetics; RNA-Binding Protein EWS/metabolism; Proto-Oncogene Protein c-fli-1/genetics; Proto-Oncogene Protein c-fli-1/metabolism; Oncogene Proteins, Fusion/genetics; Oncogene Proteins, Fusion/metabolism

5.

Selecting deep neural networks that yield consistent attribution-based interpretations for genomics.

Majdandzic, Antonio; Rajesh, Chandana; Tang, Amber; Toneyan, Shushan; Labelson, Ethan; Tripathy, Rohit; Koo, Peter K.

Proc Mach Learn Res ; 200: 131-149, 2022 Nov.

Article in English | MEDLINE | ID: mdl-37205975

ABSTRACT

Deep neural networks (DNNs) have advanced our ability to take DNA primary sequence as input and predict a myriad of molecular activities measured via high-throughput functional genomic assays. Post hoc attribution analysis has been employed to provide insights into the importance of features learned by DNNs, often revealing patterns such as sequence motifs. However, attribution maps typically harbor spurious importance scores to an extent that varies from model to model, even for DNNs whose predictions generalize well. Thus, the standard approach for model selection, which relies on performance of a held-out validation set, does not guarantee that a high-performing DNN will provide reliable explanations. Here we introduce two approaches that quantify the consistency of important features across a population of attribution maps; consistency reflects a qualitative property of human interpretable attribution maps. We employ the consistency metrics as part of a multivariate model selection framework to identify models that yield high generalization performance and interpretable attribution analysis. We demonstrate the efficacy of this approach across various DNNs quantitatively with synthetic data and qualitatively with chromatin accessibility data.

6.

Evaluating deep learning for predicting epigenomic profiles.

Toneyan, Shushan; Tang, Ziqi; Koo, Peter K.

Nat Mach Intell ; 4(12): 1088-1100, 2022 Dec.

Article in English | MEDLINE | ID: mdl-37324054

ABSTRACT

Deep learning has been successful at predicting epigenomic profiles from DNA sequences. Most approaches frame this task as a binary classification relying on peak callers to define functional activity. Recently, quantitative models have emerged to directly predict the experimental coverage values as a regression. As new models continue to emerge with different architectures and training configurations, a major bottleneck is forming due to the lack of ability to fairly assess the novelty of proposed models and their utility for downstream biological discovery. Here we introduce a unified evaluation framework and use it to compare various binary and quantitative models trained to predict chromatin accessibility data. We highlight various modeling choices that affect generalization performance, including a downstream application of predicting variant effects. In addition, we introduce a robustness metric that can be used to enhance model selection and improve variant effect predictions. Our empirical study largely supports that quantitative modeling of epigenomic profiles leads to better generalizability and interpretability.

7.

Deconvolution of expression for nascent RNA-sequencing data (DENR) highlights pre-RNA isoform diversity in human cells.

Zhao, Yixin; Dukler, Noah; Barshad, Gilad; Toneyan, Shushan; Danko, Charles G; Siepel, Adam.

Bioinformatics ; 37(24): 4727-4736, 2021 Dec 11.

Article in English | MEDLINE | ID: mdl-34382072

ABSTRACT

MOTIVATION: Quantification of isoform abundance has been extensively studied at the mature RNA level using RNA-seq but not at the level of precursor RNAs using nascent RNA sequencing. RESULTS: We address this problem with a new computational method called Deconvolution of Expression for Nascent RNA-sequencing data (DENR), which models nascent RNA-sequencing read-counts as a mixture of user-provided isoforms. The baseline algorithm is enhanced by machine-learning predictions of active transcription start sites and an adjustment for the typical 'shape profile' of read-counts along a transcription unit. We show that DENR outperforms simple read-count-based methods for estimating gene and isoform abundances, and that transcription of multiple pre-RNA isoforms per gene is widespread, with frequent differences between cell types. In addition, we provide evidence that a majority of human isoform diversity derives from primary transcription rather than from post-transcriptional processes. AVAILABILITY AND IMPLEMENTATION: DENR and nascentRNASim are freely available at https://github.com/CshlSiepelLab/DENR (version v1.0.0) and https://github.com/CshlSiepelLab/nascentRNASim (version v0.3.0). SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subjects

RNA Isoforms; RNA; Humans; RNA Isoforms/genetics; Software; Protein Isoforms/genetics; Sequence Analysis, RNA/methods; Eukaryotic Initiation Factors/genetics
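The DENR abstract describes modeling read counts as a mixture of user-provided isoforms. One minimal way to picture such a mixture is a non-negative least-squares fit of per-bin counts to isoform coverage masks. The sketch below is not DENR's algorithm: the binning, the 0/1 masks, the squared-error objective, the projected gradient solver, and the name `fit_abundances` are all assumptions for illustration.

```python
def fit_abundances(masks, counts, steps=2000):
    """Estimate non-negative isoform abundances a_i such that
    sum_i a_i * masks[i][j] approximates counts[j] for every bin j.

    masks[i][j] is 1 if isoform i covers bin j, else 0 (assumed encoding).
    Uses projected gradient descent on 0.5 * squared error.
    """
    n_iso, n_bins = len(masks), len(counts)
    # Step size 1 / (sum of squared mask entries) is a conservative bound
    # that keeps plain gradient descent stable for this objective.
    lr = 1.0 / sum(v * v for row in masks for v in row)
    abundances = [0.0] * n_iso
    for _ in range(steps):
        predicted = [
            sum(abundances[i] * masks[i][j] for i in range(n_iso))
            for j in range(n_bins)
        ]
        residual = [predicted[j] - counts[j] for j in range(n_bins)]
        gradient = [
            sum(residual[j] * masks[i][j] for j in range(n_bins))
            for i in range(n_iso)
        ]
        # Gradient step followed by projection onto a_i >= 0.
        abundances = [
            max(0.0, abundances[i] - lr * gradient[i]) for i in range(n_iso)
        ]
    return abundances


if __name__ == "__main__":
    # Two hypothetical isoforms over four bins: a long one covering all
    # bins and a short one starting at an internal start site.
    isoform_masks = [[1, 1, 1, 1], [0, 0, 1, 1]]
    bin_counts = [3, 3, 8, 8]
    print(fit_abundances(isoform_masks, bin_counts))
```

The abstract's refinements (machine-learning predictions of active transcription start sites and a shape-profile adjustment along the transcription unit) would change which isoforms enter the fit and how each mask is weighted. They are not modeled here.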
